Abstract: One-bit quantization can be implemented by a single comparator, which operates at low power and a high rate. Hence one-bit compressive sensing (1bit-CS) has become very attractive in signal processing. When the measurements are corrupted by noise during signal acquisition and transmission, 1bit-CS is usually modeled as minimizing a loss function with a sparsity constraint. The existing loss functions include the hinge loss and the linear loss. Although 1bit-CS can be regarded as a binary classification problem, because a one-bit measurement only provides sign information, the preference for the hinge loss over the linear loss in binary classification does not carry over to 1bit-CS. Many experiments show that the linear loss performs better than the hinge loss for 1bit-CS. Motivated by this observation, we consider the pinball loss, which provides a bridge between the hinge loss and the linear loss. Using this bridge, two 1bit-CS models and two corresponding algorithms are proposed. Pinball loss iterative hard thresholding improves the performance of the binary iterative hard thresholding proposed in [6] and is suitable for the case when the sparsity of the true signal is given. Elastic-net pinball support vector machine generalizes the passive model proposed in [11] and is suitable for the case when the sparsity of the true signal is not given. A fast dual coordinate ascent algorithm is proposed to solve the elastic-net pinball support vector machine problem, and its convergence is proved. Numerical experiments demonstrate that the pinball loss, as a trade-off between the hinge loss and the linear loss, improves the performance of the existing 1bit-CS models.

Index Terms: compressive sensing, one-bit, classification, pinball loss, dual coordinate ascent
In analog-to-digital conversion and the subsequent signal processing stages, quantization is an important issue. The most extreme quantization scheme acquires only one bit for each measurement. This scheme needs only a single comparator and has many benefits in hardware implementation, such as low power and a high rate. Suppose that we have a linear measurement system $\mathbf{u}\in\mathbb{R}^n$ for a signal $\mathbf{x}\in\mathbb{R}^n$. Then the analog measurement is $\mathbf{u}^T\mathbf{x}$, and the one-bit quantized observation is only its sign, i.e., $y = \mathrm{sgn}(\mathbf{u}^T\mathbf{x})$. We set the sign of a non-negative number to $1$ and that of a negative number to $-1$. Then the signal recovery problem related to one-bit measurements can be formulated as finding a signal $\mathbf{x}$ from the signs of a set of measurements, i.e., $\{\mathbf{u}_i, y_i\}_{i=1}^m$ with

$$y_i = \mathrm{sgn}(\mathbf{u}_i^T\mathbf{x}).$$

Let $U = [\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_m]$ and $\mathbf{y} = [y_1, y_2, \ldots, y_m]^T$ denote the measurement systems and the measurements, respectively.
It is easy to see that signals with the same direction but different magnitudes have the same one-bit measurements under the same measurement systems, i.e., the magnitude of the signal is lost in this quantization. Therefore, we have to make an additional assumption on the magnitude of $\mathbf{x}$. Without loss of generality, we assume $\|\mathbf{x}\|_2 = 1$. Then one-bit signal recovery can be interpreted as finding the subset of the unit sphere $\|\mathbf{x}\|_2 = 1$, partitioned by the random hyperplanes $\{\mathbf{x} : \mathbf{u}_i^T\mathbf{x} = 0\}$, that contains the signal. In general, when the number of hyperplanes becomes larger, the feasible set becomes smaller, and the recovery result becomes more accurate.

However, there may still be infinitely many points in the subset, and we need additional assumptions on the signal to make it unique. Compressive sensing (CS, 12, B, 0) tells us that if the signal is sparse, we may exactly recover it from far fewer measurements than the dimension of the signal 0, 0, ©. This technique has been successfully applied in many fields. However, quantization is rarely considered in these applications.

Motivated by the advantages of one-bit quantization and CS, one-bit compressive sensing (1bit-CS) was proposed in [4] and has attracted much attention in recent years. 1bit-CS tries to recover a sparse signal from the signs of a small number of measurements. Here the number of measurements can be larger than the dimension of the signal, which is different from regular CS. As in regular CS problems, the fundamental assumption for 1bit-CS is that the true signal is sparse, i.e., only a few components of the signal are non-zero. Then, 1bit-CS aims to find the sparsest solution in the feasible set, i.e.,


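$$\begin{aligned}\min_{\mathbf{x}\in\mathbb{R}^n}\quad & \|\mathbf{x}\|_0\\ \text{s.t.}\quad & y_i = \mathrm{sgn}(\mathbf{u}_i^T\mathbf{x}),\quad \forall i = 1, 2, \ldots, m,\\ & \|\mathbf{x}\|_2 = 1,\end{aligned} \tag{1}$$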

where $\|\cdot\|_0$ counts the number of non-zero components. This problem is non-convex because of the $\ell_0$-norm in the objective and the constraint $\|\mathbf{x}\|_2 = 1$. Several algorithms solve (1) or its variants approximately; see 0 , 0 , Q, (8).

In (1), we require that $y_i = \mathrm{sgn}(\mathbf{u}_i^T\mathbf{x})$ holds for all measurements, assuming that there is no noise in the measurements. However, in real applications, there is always noise in the measuring process, i.e., $y_i = \mathrm{sgn}(\mathbf{u}_i^T\mathbf{x} + \varepsilon_i)$ with $\varepsilon_i \neq 0$. When the noise is small and $\mathrm{sgn}(\mathbf{u}_i^T\mathbf{x} + \varepsilon_i) = \mathrm{sgn}(\mathbf{u}_i^T\mathbf{x})$, we can still recover the true signal accurately. Robustness to small noise is one of the advantages of 1bit-CS. However, when the noise $\varepsilon_i$ is large, we may have $\mathrm{sgn}(\mathbf{u}_i^T\mathbf{x} + \varepsilon_i) \neq \mathrm{sgn}(\mathbf{u}_i^T\mathbf{x})$. In addition, there could be sign flips on components of $\mathbf{y}$ during transmission. Note that sign changes caused by noise happen with a higher probability when the magnitudes of the true analog measurements are small, while sign flips during transmission happen randomly among the measurements. Because of this difference, the methods for dealing with these two types of sign changes also differ.
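To make the distinction between the two sources of sign changes concrete, the following Python sketch simulates one-bit measurements with additive noise before quantization and random sign flips after it. The Gaussian measurement vectors, the Gaussian noise model, and the helper name `one_bit_measurements` are illustrative assumptions, not part of the original text. Noise-induced sign changes should concentrate where $|\mathbf{u}_i^T\mathbf{x}|$ is small, whereas the flipped indices are drawn uniformly at random.

```python
import numpy as np

def one_bit_measurements(x, m, noise_std=0.0, flip_ratio=0.0, rng=None):
    """Hypothetical helper: y_i = sgn(u_i^T x + eps_i), then flip a random
    fraction of the signs to mimic transmission errors.

    Assumes Gaussian u_i and Gaussian eps_i (illustration only).
    Returns U (n x m, columns u_i), y in {-1, +1}^m, and the flipped indices.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x.size
    U = rng.standard_normal((n, m))                  # columns are u_1, ..., u_m
    analog = U.T @ x + noise_std * rng.standard_normal(m)
    y = np.where(analog >= 0, 1.0, -1.0)             # sgn(0) := +1, as in the text
    # Transmission flips hit uniformly random measurements,
    # independently of the analog magnitude.
    n_flip = int(round(flip_ratio * m))
    flip_idx = rng.choice(m, size=n_flip, replace=False)
    y[flip_idx] *= -1
    return U, y, flip_idx

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m, K = 1000, 500, 10
    x = np.zeros(n)
    x[rng.choice(n, size=K, replace=False)] = rng.standard_normal(K)
    x /= np.linalg.norm(x)                           # magnitude is lost anyway
    U, y, flip_idx = one_bit_measurements(x, m, noise_std=0.1,
                                          flip_ratio=0.1, rng=rng)
    analog = U.T @ x
    changed = y != np.where(analog >= 0, 1.0, -1.0)
    not_flipped = np.ones(m, dtype=bool)
    not_flipped[flip_idx] = False
    noise_changed = changed & not_flipped            # changes caused by noise
    print("noise-induced sign changes:", int(noise_changed.sum()))
    print("mean |u_i^T x| overall:         ", np.abs(analog).mean())
    print("mean |u_i^T x| at noise changes:", np.abs(analog[noise_changed]).mean())
```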

With noise and/or sign flips, the feasible set of (1) excludes the true signal and can even become empty. To deal with noise and sign flips, soft loss functions are used to replace the hard constraint, which leads to robust 1bit-CS models. The first robust model is given by 0. It uses the following hinge loss to measure the sign changes:

$$\ell_{\text{hinge}}(t) = \max\{0, t\}.$$

In the same paper, the squared hinge loss is also considered. The work in [9] considers the following linear loss:

$$\ell_{\text{linear}}(t) = t.$$


By minimizing the hinge or the linear loss, several robust 1bit-CS models and corresponding algorithms have been proposed in 0 , 0 , m, on, among others. These models will be reviewed in Section II. With these robust models, 1bit-CS becomes more attractive. For example, it is shown in m and ED that, under some conditions, signal recovery based on one-bit measurements is even better than conventional methods in the presence of nonlinear distortions and heavy noise.

In 1bit-CS, we only have the sign information $y_i \in \{-1, +1\}$, and hence recovering $\mathbf{x}$ can be regarded as a binary classification problem. In binary classification, the hinge loss is widely used; e.g., it is the loss function of the classical support vector machine (SVM, El). In El, El and other literature, it is shown that the hinge loss enjoys many good properties for classification, such as classification-calibration and Bayes consistency. In traditional classification tasks, the linear loss is rarely considered. Recently, it was found in El that applying the linear loss in SVM is equivalent to the classical kernel rule [18], which is computationally efficient yet lacks accuracy in many tasks. However, according to the experiments in 0 and El, the linear loss is quite suitable for 1bit-CS compared with the hinge loss.

This unusual phenomenon, namely that the linear loss performs better than the hinge loss, motivates us to investigate the properties of 1bit-CS. We will apply a pinball loss to establish recovery models for 1bit-CS. Statistically, the pinball loss is closely related to the concept of quantiles; see [19], [20], [21] for regression, and [22] for classification. In this paper, we use the following definition of the pinball loss:

$$L_\tau(t) = \begin{cases} t, & t \ge 0,\\ -\tau t, & t < 0. \end{cases} \tag{2}$$
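A minimal NumPy sketch of (2) checks numerically that $\tau = 0$ and $\tau = -1$ recover the hinge loss and the linear loss; the function name `pinball_loss` is our own choice for illustration.

```python
import numpy as np

def pinball_loss(t, tau):
    """Pinball loss (2): L_tau(t) = t for t >= 0 and -tau * t for t < 0."""
    t = np.asarray(t, dtype=float)
    return np.where(t >= 0, t, -tau * t)

t = np.linspace(-3.0, 3.0, 13)
# tau = 0 gives the hinge loss max{0, t}.
assert np.allclose(pinball_loss(t, 0.0), np.maximum(0.0, t))
# tau = -1 gives the linear loss t.
assert np.allclose(pinball_loss(t, -1.0), t)
# For -1 < tau < 0 the left branch has slope -tau in (0, 1): for t < 0 the
# loss is negative (unlike the hinge loss) but closer to 0 than the linear loss.
for tau in (-0.2, -0.5, -0.8):
    print(tau, pinball_loss([-2.0, 0.0, 2.0], tau))
```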


The pinball loss is characterized by the parameter $\tau$, and it is convex when $\tau \ge -1$. The hinge loss and the linear loss can be viewed as particular pinball loss functions with $\tau = 0$ and $\tau = -1$, respectively. In other words, $L_\tau(t)$ provides a bridge from the hinge loss to the linear loss. The hinge loss is a good choice for regular classification tasks, and the linear loss shows good performance in 1bit-CS. Hence, a suitable trade-off between them is expected to achieve better performance in 1bit-CS.

In this paper, we discuss two models based on pinball loss minimization. First, based on the model given by [6], we propose a new model consisting of pinball loss minimization, an $\ell_0$-norm constraint, and the $\ell_2$-norm unit sphere constraint. This problem is non-convex because of the $\ell_0$-norm and $\ell_2$-norm constraints. To solve this problem, pinball iterative hard thresholding (PIHT) is established and evaluated by numerical experiments. Second, we propose a convex model that contains the pinball loss, an $\ell_1$-norm regularization term, and the $\ell_2$-norm ball constraint. This model involves both the $\ell_1$-norm and the $\ell_2$-norm, so we name it elastic-net pin-SVM (ep-SVM). When $\tau = -1$, it reduces to the model given by [11]. To solve ep-SVM efficiently, its dual problem is derived, and a dual coordinate ascent algorithm is given. This algorithm is shown to converge to a global optimum, and its effectiveness is illustrated by numerical experiments.

The rest of this paper is organized as follows. A brief review of the existing 1bit-CS methods is given in Section II. Section III introduces the pinball loss and proposes a pinball loss model with an $\ell_0$-norm constraint for 1bit-CS. In Section IV, the elastic-net pin-SVM is discussed, and an algorithm is provided to solve it. Both proposed methods are evaluated in numerical experiments in Section V, showing the performance of the pinball loss in 1bit-CS. Section VI concludes the paper.


II. Review of 1bit-CS Models

1bit-CS was introduced in 2008 by [4], and since then it has attracted a lot of attention. The original model (1) is hard to solve because the $\ell_0$-norm is nonsmooth and non-convex. One alternative is to minimize its convex relaxation, the $\ell_1$-norm, which gives the following 1bit-CS model:

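$$\begin{aligned}\min_{\mathbf{x}\in\mathbb{R}^n}\quad & \|\mathbf{x}\|_1\\ \text{s.t.}\quad & y_i(\mathbf{u}_i^T\mathbf{x}) \ge 0,\quad \forall i = 1, 2, \ldots, m,\\ & \|\mathbf{x}\|_2 = 1.\end{aligned} \tag{3}$$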

This model is given by 0, and an efficient heuristic is 
established in 0. 

Because the unit sphere is non-convex, (3) is still a non-convex problem. To obtain convexity, the non-convex sphere constraint $\|\mathbf{x}\|_2 = 1$ is replaced by a convex constraint on the measurements in [23], and a convex model is established as follows:

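$$\begin{aligned}\min_{\mathbf{x}\in\mathbb{R}^n}\quad & \|\mathbf{x}\|_1\\ \text{s.t.}\quad & y_i(\mathbf{u}_i^T\mathbf{x}) \ge 0,\quad \forall i = 1, 2, \ldots, m,\\ & \|U^T\mathbf{x}\|_1 = s,\end{aligned} \tag{4}$$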

where $s$ is a given positive constant. Note that (4) can be reformulated as a linear programming problem because the second constraint $\|U^T\mathbf{x}\|_1 = s$ becomes $\sum_{i=1}^m y_i(\mathbf{u}_i^T\mathbf{x}) = s$ when the first constraint is satisfied. However, the solution of (4) is not necessarily located on the unit sphere, so one needs to project it onto the unit sphere. In fact, the solution is independent of $s$ after being projected onto the unit sphere, because scaling $s$ scales the feasible set and the objective by the same positive factor.

As mentioned before, these models work only when there is no noise or the noise is too small to change the binary measurements, i.e., when there are no sign changes in $\mathbf{y}$. In real applications, noise in the measurements is unavoidable, and there could be sign flips on $\mathbf{y}$ during transmission. Noise and/or sign flips can make the 1bit-CS problems (3) and (4) infeasible, and even when they are feasible, the true signal is not in the feasible set. In other words, the related classification problem is non-separable, and even when it is separable, the classifier is not accurate. To deal with noise and sign flips, one can use a soft loss function instead of the hard constraint. Since models with soft loss functions can tolerate noise and sign flips, they are called robust 1bit-CS models. In [6], the following robust model is introduced:

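$$\begin{aligned}\min_{\mathbf{x}\in\mathbb{R}^n}\quad & \frac{1}{m}\sum_{i=1}^m \max\{0, -y_i(\mathbf{u}_i^T\mathbf{x})\}\\ \text{s.t.}\quad & \|\mathbf{x}\|_2 = 1,\\ & \|\mathbf{x}\|_0 = K,\end{aligned} \tag{5}$$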

where $K$ is the number of non-zero components in the true signal. Binary iterative hard thresholding with a one-sided $\ell_1$-norm (BIHT) is proposed to solve it approximately. The one-sided $\ell_1$-norm is related to the hinge loss function in the classical L1-SVM m, whose statistical properties in classification have been well studied in [15], [16], [24], and [25]. Similar to the link between L1-SVM and L2-SVM, (5) can be modified into the following problem by replacing the one-sided $\ell_1$-norm with a one-sided $\ell_2$-norm:

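$$\begin{aligned}\min_{\mathbf{x}\in\mathbb{R}^n}\quad & \frac{1}{m}\sum_{i=1}^m \max\{0, -y_i(\mathbf{u}_i^T\mathbf{x})\}^2\\ \text{s.t.}\quad & \|\mathbf{x}\|_2 = 1,\\ & \|\mathbf{x}\|_0 = K,\end{aligned} \tag{6}$$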


for which binary iterative hard thresholding with a one-sided $\ell_2$-norm (BIHT-$\ell_2$, [6]) is proposed. Modifications of BIHT/BIHT-$\ell_2$ are proposed in [10] to improve their robustness to sign flips. However, these modifications cannot improve their robustness to sign changes caused by noise in the measuring process. There are several ways to deal with sign changes caused by noise; e.g., [26] uses maximum likelihood estimation, and [27] uses a logistic function.

Note that both problems (5) and (6) are non-convex, and the algorithms BIHT/BIHT-$\ell_2$ only solve them approximately. The convex model for robust 1bit-CS using the linear loss proposed in [9] is:

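$$\begin{aligned}\min_{\mathbf{x}\in\mathbb{R}^n}\quad & -\frac{1}{m}\sum_{i=1}^m y_i(\mathbf{u}_i^T\mathbf{x})\\ \text{s.t.}\quad & \|\mathbf{x}\|_2 \le 1,\\ & \|\mathbf{x}\|_1 \le s,\end{aligned} \tag{7}$$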

where $s$ is a given positive constant. The unit sphere constraint $\|\mathbf{x}\|_2 = 1$ is relaxed to the unit ball constraint $\|\mathbf{x}\|_2 \le 1$, and the sparsity constraint $\|\mathbf{x}\|_0 \le K$ is replaced by the $\ell_1$ constraint $\|\mathbf{x}\|_1 \le s$. Moreover, the one-sided $\ell_1$-norm is replaced by the linear loss to avoid the trivial zero solution, and minimizing the linear loss can be interpreted as maximizing the correlation between $y_i$ and $\mathbf{u}_i^T\mathbf{x}$. Equivalently, one can put the $\ell_1$-norm in the objective function instead of in the constraint. The corresponding problem is given by [11]:

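$$\begin{aligned}\min_{\mathbf{x}\in\mathbb{R}^n}\quad & \mu\|\mathbf{x}\|_1 - \frac{1}{m}\sum_{i=1}^m y_i(\mathbf{u}_i^T\mathbf{x})\\ \text{s.t.}\quad & \|\mathbf{x}\|_2 \le 1,\end{aligned} \tag{8}$$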

where $\mu$ is the regularization parameter for the $\ell_1$-norm. In the rest of this paper, we call (7) Plan's model and (8) the passive model. The latter name comes from the algorithm for (8) in [11]. Both problems (7) and (8) are convex, and there is a closed-form solution for (8).
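The closed form of (8) is not written out above. A standard way to obtain it, stated here as a sketch: with $\mathbf{w} = \frac{1}{m}\sum_{i=1}^m y_i\mathbf{u}_i$, problem (8) maximizes $\mathbf{w}^T\mathbf{x} - \mu\|\mathbf{x}\|_1$ over $\|\mathbf{x}\|_2 \le 1$, whose solution is the normalized soft-thresholded vector $S_\mu(\mathbf{w})/\|S_\mu(\mathbf{w})\|_2$, or $\mathbf{x}^* = \mathbf{0}$ when $S_\mu(\mathbf{w}) = \mathbf{0}$, i.e., when $\|\mathbf{w}\|_\infty \le \mu$. The helper names below are hypothetical.

```python
import numpy as np

def soft_threshold(v, mu):
    """Componentwise soft thresholding S_mu(v) = sign(v) * max(|v| - mu, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - mu, 0.0)

def passive_model(U, y, mu):
    """Closed-form minimizer of (8):
        min_x  mu*||x||_1 - (1/m) sum_i y_i u_i^T x   s.t. ||x||_2 <= 1.
    U is n x m with the measurement vectors u_i as columns."""
    m = y.size
    w = U @ y / m                        # (1/m) sum_i y_i u_i
    s = soft_threshold(w, mu)
    norm_s = np.linalg.norm(s)
    if norm_s == 0.0:                    # ||w||_inf <= mu: x* = 0 is optimal
        return np.zeros_like(w)
    return s / norm_s

def passive_objective(x, U, y, mu):
    """Objective of (8) evaluated at x."""
    return mu * np.abs(x).sum() - np.mean(y * (U.T @ x))
```

The optimal value equals $-\|S_\mu(\mathbf{w})\|_2$, which is a convenient sanity check.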

III. The Pinball Loss for 1bit-CS
A. Pinball loss 

In robust 1bit-CS models, the loss function plays an important role. According to the experiments in hd, Plan's model and the passive model, which both minimize the linear loss, perform much better than BIHT/BIHT-$\ell_2$. However, the linear loss is quite rare in other classification tasks. To the best of our knowledge, among the existing classification methods, only the classical kernel rule [18], which is computationally efficient but generally has poor classification accuracy, can be regarded as a support vector machine with the linear loss. This connection was recently discussed in m.

To improve on the performance of the one-sided $\ell_1$-norm and the linear loss, in this paper we consider the pinball loss defined in (2). Note that there are other equivalent formulations of the pinball loss in [19] and [20]. The parameter $\tau$ is a key parameter of the pinball loss, and the one-sided $\ell_1$-norm and the linear loss correspond to the cases $\tau = 0$ and $\tau = -1$, respectively.
If $\tau \ge -1$, $L_\tau(t)$ is convex. Thus, according to Theorem 2 of m, one can verify that it is classification-calibrated, i.e., the function that minimizes the risk induced by $L_\tau(u)$ has the same sign as the Bayes rule. Furthermore, if $\tau$ is non-negative, it is proved in [22] that minimizing the pinball loss results in the Bayes rule. However, when $\tau \in [-1, 0)$, the pinball loss is not consistent with the Bayes rule because $L_\tau(u)$ with a negative $\tau$ is not bounded below. Thus, in most classification problems, a negative $\tau$ performs poorly, especially when $\tau$ is close to $-1$. However, the experiments on 1bit-CS contradict this conventional wisdom: $\tau = -1$ leads to much better results than $\tau = 0$, which implies that 1bit-CS has some special properties and motivates us to investigate the pinball loss with $\tau \in [-1, 0]$.

B. Pinball iterative hard thresholding 

In this section, we replace the one-sided $\ell_1$-norm in (5) with the pinball loss, expecting better performance. Specifically, we establish the following model:

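$$\begin{aligned}\min_{\mathbf{x}\in\mathbb{R}^n}\quad & \frac{1}{m}\sum_{i=1}^m L_\tau\big(c - y_i(\mathbf{u}_i^T\mathbf{x})\big)\\ \text{s.t.}\quad & \|\mathbf{x}\|_2 = 1,\\ & \|\mathbf{x}\|_0 = K.\end{aligned} \tag{9}$$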

In addition to using a different loss function, we also introduce a bias term $c \ge 0$ in the loss. One can choose $c$ based on the measurement systems and even choose different values of $c$ for different measurements. However, for simplicity, we use the same $c$ for all measurements.

Minimizing a classification loss of $c - y_i(\mathbf{u}_i^T\mathbf{x})$ pushes the data into the half-spaces $y_i(\mathbf{u}_i^T\mathbf{x}) \ge c$. The margin between the two half-spaces is $2c/\|\mathbf{x}\|_2$. In 1bit-CS, $\|\mathbf{x}\|_2$ is fixed to 1, so the margin is $2c$. Pursuing a large margin between two classes is helpful, especially when the data are noise-corrupted. In most SVM classifiers, $c$ is set to one. In BIHT, $c = 0$ and the loss function becomes the one-sided $\ell_1$-norm of $-y_i(\mathbf{u}_i^T\mathbf{x})$. We will show the effect of $c$ in robust 1bit-CS after introducing the algorithm for (9).
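Viewing $\mathbf{x}$ as the normal vector of a linear classifier acting on the measurement vectors $\mathbf{u}$, the margin quoted above is the distance between the two boundary hyperplanes (a short worked step):

$$\mathcal{H}_{\pm} = \{\mathbf{u}\in\mathbb{R}^n : \mathbf{u}^T\mathbf{x} = \pm c\},\qquad \operatorname{dist}(\mathcal{H}_+, \mathcal{H}_-) = \frac{c - (-c)}{\|\mathbf{x}\|_2} = \frac{2c}{\|\mathbf{x}\|_2},$$

which equals $2c$ on the unit sphere $\|\mathbf{x}\|_2 = 1$. A measurement with $y_i = +1$ satisfies $y_i(\mathbf{u}_i^T\mathbf{x}) \ge c$ only if $\mathbf{u}_i$ lies on the far side of $\mathcal{H}_+$, and symmetrically for $y_i = -1$ and $\mathcal{H}_-$.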
Replacing the subgradient of the one-sided $\ell_1$-norm in BIHT with that of the pinball loss, we obtain Pinball Iterative Hard Thresholding (PIHT) for (9). The algorithm is summarized in Algorithm 1, where $\eta_K$ stands for the best $K$-term approximation used in BIHT [6]. It is not hard to verify that $U\mathbf{g}^l$ gives a subgradient of $\sum_{i=1}^m L_\tau\big(c - y_i(\mathbf{u}_i^T\mathbf{x})\big)$ at $\mathbf{x}^l$, which parallels Lemma 5 in [6]. As with BIHT, the convergence of PIHT is not guaranteed, so the user needs to specify a maximum number of iterations. Although BIHT lacks a convergence analysis, it shows good performance in noiseless 1bit-CS and is widely applied; see, e.g., [28], [29], [30].

Algorithm 1: Pinball Iterative Hard Thresholding (PIHT)
Set $\mathbf{x}^0$, $K$, $l_{\max}$, $\alpha > 0$, and $l := 0$;
repeat
    Calculate $\mathbf{g}^l$ as
    $$g_i^l = \begin{cases} -y_i, & y_i(\mathbf{u}_i^T\mathbf{x}^l) < c,\\ \tau y_i, & y_i(\mathbf{u}_i^T\mathbf{x}^l) \ge c; \end{cases} \tag{10}$$
    Update $\mathbf{a}^{l+1} = \mathbf{x}^l - \alpha U\mathbf{g}^l$;
    Calculate $\mathbf{x}^{l+1} = \eta_K(\mathbf{a}^{l+1})$;
    $l := l + 1$;
until $l \ge l_{\max}$
Return $\hat{\mathbf{x}} = \mathbf{x}^l/\|\mathbf{x}^l\|_2$.

We now give a simple example to investigate the performance of the pinball loss for different $\tau$ and $c$ values.

Experiment 1. We randomly generate a 1000-dimensional 10-sparse vector $\mathbf{x}$, i.e., there are 10 non-zero components in $\mathbf{x}$. The positions of the non-zero components are selected randomly, and their values follow the standard Gaussian distribution. We take 500 binary measurements with $\mathbf{u}_i$ also drawn from the standard Gaussian distribution. Here, we consider the noise-free case in which 10% of the measurements are flipped.

Algorithm 1 is used to recover the signal, and the result is denoted by $\hat{\mathbf{x}}$. The step size $\alpha$ is chosen as suggested in Ml? and kept fixed. The average recovery error $\|\hat{\mathbf{x}} - \mathbf{x}\|_2$ over 100 runs is used to measure the recovery performance. In Fig. 1, the average recovery errors for different $\tau$ and $c$ values are plotted, and the performance corresponding to the one-sided $\ell_1$-norm and the linear loss is marked. In general, we conclude that PIHT with a suitable negative $\tau$ improves the performance of BIHT. The performance of PIHT is not very sensitive to $c$ when $c > 0.3$, and we suggest $c = 1$, which coincides with the loss used in most SVM classifiers.
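The following NumPy sketch implements Algorithm 1 and runs a single, scaled-down instance of the Experiment 1 setup. The zero initialization, the fixed step size $\alpha = 1/m$, and the normalization of the true signal are our assumptions (the step size used for Fig. 1 is not specified in this section), so the printed errors are not expected to reproduce Fig. 1.

```python
import numpy as np

def hard_threshold(a, K):
    """eta_K: best K-term approximation (keep the K largest |a_j|, zero the rest)."""
    x = np.zeros_like(a)
    idx = np.argpartition(np.abs(a), -K)[-K:]
    x[idx] = a[idx]
    return x

def piht(U, y, K, tau=-0.2, c=1.0, alpha=None, max_iter=100):
    """Sketch of Algorithm 1 (PIHT). U is n x m with columns u_i."""
    n, m = U.shape
    alpha = 1.0 / m if alpha is None else alpha   # assumed fixed step size
    x = np.zeros(n)                               # x^0 = 0 (assumption)
    for _ in range(max_iter):
        margins = y * (U.T @ x)                   # y_i (u_i^T x^l)
        g = np.where(margins < c, -y, tau * y)    # g^l from (10)
        a = x - alpha * (U @ g)                   # a^{l+1} = x^l - alpha U g^l
        x = hard_threshold(a, K)                  # x^{l+1} = eta_K(a^{l+1})
    norm_x = np.linalg.norm(x)
    return x / norm_x if norm_x > 0 else x        # x_hat = x^l / ||x^l||_2

if __name__ == "__main__":
    # One run of a scaled-down Experiment 1 (no noise, 10% sign flips).
    rng = np.random.default_rng(0)
    n, m, K = 1000, 500, 10
    x_true = np.zeros(n)
    x_true[rng.choice(n, size=K, replace=False)] = rng.standard_normal(K)
    x_true /= np.linalg.norm(x_true)
    U = rng.standard_normal((n, m))
    y = np.where(U.T @ x_true >= 0, 1.0, -1.0)
    flip = rng.choice(m, size=m // 10, replace=False)
    y[flip] *= -1
    for tau in (0.0, -0.2, -1.0):                 # hinge, pinball, linear
        x_hat = piht(U, y, K, tau=tau, c=1.0)
        print(f"tau={tau:+.1f}  error={np.linalg.norm(x_hat - x_true):.3f}")
```

Setting `tau=0.0` turns the update into the hinge-loss (BIHT-style) step, so the same function can be used to compare the two losses.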




Fig. 1. Average recovery error of PIHT for different $\tau$ and $c$ values. In this experiment, there is no noise in the measurements, but 10% of the signs are flipped. (a) The bias term is set to $c = 0$ and different $\tau$ values are tested; (b) $\tau = -0.2$ and different $c$ values are evaluated.


C. Properties of pinball loss and possible extensions 

From Fig. 1, we observe that significantly better performance can be achieved by PIHT with a negative $\tau$. We do not claim which $\tau$ value is the best, but the recovery performance is not monotonic with respect to $\tau$. In general, the pinball loss with a negative $\tau$ performs better than the hinge loss, i.e., the pinball loss with $\tau = 0$. This conflicts with observations from many other classification tasks and motivates us to investigate the special properties of 1bit-CS.

Consider $\mathbf{g}^l$ calculated by (10) for $\mathbf{x}^l$. When $\tau = 0$, i.e., when the hinge loss is minimized, $g_i^l$ is non-zero only for the observations satisfying $y_i(\mathbf{u}_i^T\mathbf{x}^l) < c$, i.e., the observations that are not strictly correctly classified. Here we regard 1bit-CS as a binary classification problem, and by "strictly" we mean that the analog measurement is not near zero. Because hinge loss minimization minimizes the sum of the distances to the decision boundary over the measurements that are not strictly correctly classified, the measurements that are strictly correctly classified do not contribute to the optimal solution. If we let $c = 0$ as in (5), then the true signal is optimal when there is no noise or sign flip, because the objective value is bounded below by 0 and the objective value at the true signal is 0. When there are sign changes in the measurements, the objective value at the true signal is no longer 0, and only the measurements with inconsistent signs, i.e., those for which the sign of $y_i$ differs from that of $\mathbf{u}_i^T\mathbf{x}$, contribute to the optimal solution of (5). Thus many measurements are useless in determining the optimal solution.

The idea behind the linear loss and PIHT is to draw information not only from the incorrectly classified data but also from the correctly classified ones. For example, when $\tau < 0$, $\mathbf{g}^l$ calculated by (10) also encourages a larger $y_i(\mathbf{u}_i^T\mathbf{x}^l)$ when $y_i(\mathbf{u}_i^T\mathbf{x}^l) \ge c$. In this way, all measurements contribute to the final result, and the influence of sign flips and noise is weakened.

If we can detect the measurements with sign changes accurately, we can remove them or replace them with the opposite values. In [10], Adaptive Outlier Pursuit (AOP) is designed to detect sign flips during transmission. By adaptively detecting the measurements with sign flips, AOP significantly improves the performance of BIHT for 1bit-CS. We can also combine AOP and PIHT to improve the performance of PIHT; the new method is called AOP-PIHT. Because AOP detects the sign flips more and more accurately during the iterations, the effect of $\tau$ should be decreased. We heuristically set $\tau = 0.95^{l_{\text{out}}}\tau_0$ in AOP-PIHT, where $\tau_0$ is the initial $\tau$ value for the pinball loss and $l_{\text{out}}$ is the counter of the outer loop in AOP-PIHT. We compare the performance of BIHT, PIHT, AOP-BIHT, and AOP-PIHT in the following two experiments.

Experiment 2. We randomly generate a 1000-dimensional 15-sparse vector $\mathbf{x}$ in the same way as in Experiment 1. We compare PIHT, BIHT, AOP-BIHT, and AOP-PIHT for recovering the signal with different numbers of binary measurements ($m = 200, 300, \ldots, 1500$). Again, there is no noise in the measurements, but 10% of the signs are flipped. The average recovery errors of these four methods are shown in Fig. 2.



Fig. 2. The performance of BIHT (green dotted line), PIHT (blue dashed 
line), AOP-BIHT (black dot-dashed line), and AOP-PIHT (red solid line) for 
different numbers of measurements. 


From Fig. 2, we can see that PIHT performs better than BIHT for all $m$ before AOP is applied, and both algorithms perform very similarly after AOP is applied. AOP improves the performance of both algorithms because it detects most sign flips, and after the sign flips are corrected, the measurements are more accurate. However, without AOP, PIHT is more robust than BIHT.

As mentioned in the previous sections, there are two main sources of sign changes. Although AOP is able to detect random sign flips, sign changes caused by noise are more difficult to detect. The next experiment shows that PIHT is able to deal with sign changes caused mainly by noise. In this experiment, the performance of BIHT, PIHT, AOP-BIHT, and AOP-PIHT is evaluated in two cases: first with noise only, and then with both noise and sign flips.

Experiment 3. We randomly generate a 1000-dimensional 15-sparse vector $\mathbf{x}$ in the same way as in Experiment 1. Then we take 800 analog measurements and add noise with different signal-to-noise ratio (SNR) values (SNR = 1, ..., 50) before the quantization. First we consider the case without sign flips; then we flip 10% of the measurements after the quantization. In Fig. 3, the average recovery errors of BIHT, PIHT, AOP-BIHT, and AOP-PIHT are displayed by green dotted, blue dashed, black dot-dashed, and red solid lines, respectively.




Fig. 3. The performance of BIHT (green dotted line), PIHT (blue dashed line), AOP-BIHT (black dot-dashed line), and AOP-PIHT (red solid line) for noisy data with different levels of noise: (a) no sign flips; (b) 10% of the measurements are flipped.

When noise is the main source of sign changes, e.g., with no sign flips in Fig. 3(a) and at low SNRs in Fig. 3(b), PIHT has the best performance among these four algorithms. AOP-BIHT and AOP-PIHT perform similarly, and both are worse than PIHT, i.e., AOP reduces the performance of PIHT when noise is the main source of sign changes. However, when sign changes are mainly caused by random sign flips, e.g., in Fig. 2 and at high SNRs in Fig. 3(b), AOP-PIHT performs better than PIHT. This confirms that the two sources of sign changes have to be treated differently and that AOP is only suitable when the number of random sign flips is large.

The main purpose of this paper is to introduce the pinball loss for 1bit-CS. We leave modifications based on the pinball loss for future work. In general, many advanced techniques for BIHT are also applicable to PIHT. Since minimizing the pinball loss instead of the hinge loss can improve performance, one can naturally expect that modifications of PIHT, e.g., AOP-PIHT, will achieve better performance.

IV. Elastic-net Pin-SVM 

A. Primal problem 

In the above section, we replaced the one-sided $\ell_1$-norm in BIHT with the pinball loss and established PIHT for robust 1bit-CS. Numerical experiments illustrate that PIHT performs better than BIHT. However, problem (9), which PIHT solves, is non-convex, and there is no guarantee that PIHT converges to the global optimum of (9). In this section, we propose a convex model using the pinball loss and derive an algorithm to solve it. Specifically, the convex problem is

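$$\begin{aligned}\min_{\mathbf{x}\in\mathbb{R}^n}\quad & \mu\|\mathbf{x}\|_1 + \frac{1}{m}\sum_{i=1}^m L_\tau\big(c - y_i(\mathbf{u}_i^T\mathbf{x})\big)\\ \text{s.t.}\quad & \|\mathbf{x}\|_2 \le 1.\end{aligned} \tag{11}$$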

Here $\mu$ is a parameter that balances the regularization term $\|\mathbf{x}\|_1$ and the data loss term. We call (11) elastic-net pin-SVM (ep-SVM) because it involves both the $\ell_1$-norm and the $\ell_2$-norm.

When $\tau = -1$, the pinball loss becomes the linear loss, and (11) reduces (up to the constant $c$) to the passive model (8), for which there is a closed-form solution [11]. Based on experience with other classification tasks and the performance in Fig. 1, we expect that a suitably selected $\tau$ may improve the performance. However, for $\tau > -1$, analytic solutions are not available, and we need an efficient algorithm. We will introduce a coordinate ascent method to solve (11).


B. Dual problem 

To obtain the dual problem of (11), we reformulate (11) into the following problem:

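$$\begin{aligned}\min_{\mathbf{x},\mathbf{e},\mathbf{z}}\quad & P(\mathbf{x},\mathbf{e},\mathbf{z}) := \mu\|\mathbf{e}\|_1 + \frac{1}{m}\sum_{i=1}^m L_\tau(z_i) + \iota(\mathbf{x})\\ \text{s.t.}\quad & \mathbf{x} = \mathbf{e},\\ & c - y_i(\mathbf{u}_i^T\mathbf{x}) = z_i,\quad i = 1, 2, \ldots, m,\end{aligned} \tag{12}$$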

where $\iota(\mathbf{x})$ is the indicator function defined as

$$\iota(\mathbf{x}) = \begin{cases} 0, & \|\mathbf{x}\|_2 \le 1,\\ +\infty, & \text{otherwise}. \end{cases}$$

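The corresponding Lagrangian is

$$\mathcal{L}(\mathbf{x},\mathbf{e},\mathbf{z},\boldsymbol\beta,\boldsymbol\xi) = \mu\|\mathbf{e}\|_1 + \frac{1}{m}\sum_{i=1}^m L_\tau(z_i) + \iota(\mathbf{x}) + \boldsymbol\beta^T(\mathbf{x} - \mathbf{e}) + \sum_{i=1}^m \xi_i\big(c - y_i(\mathbf{u}_i^T\mathbf{x}) - z_i\big).$$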

Then we minimize $\mathcal{L}$ with respect to the primal variables $\{\mathbf{x}, \mathbf{e}, \mathbf{z}\}$ and obtain the dual problem of (12) as follows:


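$$\begin{aligned}\max_{\boldsymbol\beta,\boldsymbol\xi}\quad & D(\boldsymbol\beta,\boldsymbol\xi) := c\sum_{i=1}^m \xi_i - \Big\|\sum_{i=1}^m \xi_i y_i\mathbf{u}_i - \boldsymbol\beta\Big\|_2\\ \text{s.t.}\quad & \|\boldsymbol\beta\|_\infty \le \mu,\\ & -\frac{\tau}{m} \le \xi_i \le \frac{1}{m},\quad i = 1, 2, \ldots, m.\end{aligned} \tag{13}$$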
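For completeness, the minimization of $\mathcal{L}$ over the primal variables splits into three independent parts, which accounts for each term and constraint of (13) (a worked derivation consistent with (12)):

$$\min_{\mathbf{e}}\ \big(\mu\|\mathbf{e}\|_1 - \boldsymbol\beta^T\mathbf{e}\big) = \begin{cases} 0, & \|\boldsymbol\beta\|_\infty \le \mu,\\ -\infty, & \text{otherwise}, \end{cases}$$

$$\min_{z_i}\ \Big(\frac{1}{m}L_\tau(z_i) - \xi_i z_i\Big) = \begin{cases} 0, & -\frac{\tau}{m} \le \xi_i \le \frac{1}{m},\\ -\infty, & \text{otherwise}, \end{cases}$$

$$\min_{\|\mathbf{x}\|_2 \le 1}\ \Big(\boldsymbol\beta - \sum_{i=1}^m \xi_i y_i\mathbf{u}_i\Big)^T\mathbf{x} = -\Big\|\sum_{i=1}^m \xi_i y_i\mathbf{u}_i - \boldsymbol\beta\Big\|_2 .$$

The second line follows because $\frac{1}{m}L_\tau(z) - \xi_i z$ has slope $\frac{1}{m} - \xi_i$ for $z > 0$ and $-\frac{\tau}{m} - \xi_i$ for $z < 0$, so it is bounded below exactly when the first slope is non-negative and the second is non-positive. Adding the remaining constant term $c\sum_{i=1}^m \xi_i$ gives $D(\boldsymbol\beta,\boldsymbol\xi)$, and the minimizer in the third line, $\mathbf{x} = \big(\sum_i \xi_i y_i\mathbf{u}_i - \boldsymbol\beta\big)/\|\sum_i \xi_i y_i\mathbf{u}_i - \boldsymbol\beta\|_2$ when the vector is non-zero, is the recovery formula used below.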


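Assume that we solve the dual problem and obtain the optimal $\boldsymbol\beta^*$ and $\boldsymbol\xi^*$. Then we can find the optimal $\mathbf{x}^*$ for (11) as follows:

1) If $\sum_{i=1}^m \xi_i^* y_i\mathbf{u}_i - \boldsymbol\beta^* \neq \mathbf{0}$, i.e., $\big\|\sum_{i=1}^m \xi_i^* y_i\mathbf{u}_i\big\|_\infty > \mu$, the optimal $\mathbf{x}^*$ can be obtained as

$$\mathbf{x}^* = \frac{\sum_{i=1}^m \xi_i^* y_i\mathbf{u}_i - \boldsymbol\beta^*}{\big\|\sum_{i=1}^m \xi_i^* y_i\mathbf{u}_i - \boldsymbol\beta^*\big\|_2}.$$

2) If $\sum_{i=1}^m \xi_i^* y_i\mathbf{u}_i - \boldsymbol\beta^* = \mathbf{0}$, i.e., $\big\|\sum_{i=1}^m \xi_i^* y_i\mathbf{u}_i\big\|_\infty \le \mu$, the optimal $\mathbf{x}^*$ may not be unique, and every $\mathbf{x}^*$ that satisfies

$$\|\mathbf{x}^*\|_2 \le 1, \tag{14a}$$
$$x_j^* = 0 \quad \text{if } |\beta_j^*| < \mu, \tag{14b}$$
$$x_j^* \ge 0 \quad \text{if } \beta_j^* = \mu, \tag{14c}$$
$$x_j^* \le 0 \quad \text{if } \beta_j^* = -\mu, \tag{14d}$$
$$c - y_i(\mathbf{u}_i^T\mathbf{x}^*) \ge 0 \quad \text{if } \xi_i^* = 1/m, \tag{14e}$$
$$c - y_i(\mathbf{u}_i^T\mathbf{x}^*) \le 0 \quad \text{if } \xi_i^* = -\tau/m, \tag{14f}$$
$$c - y_i(\mathbf{u}_i^T\mathbf{x}^*) = 0 \quad \text{if } \xi_i^* \in (-\tau/m, 1/m), \tag{14g}$$

is optimal.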

Remark 1: If $\big\|\frac{1}{m}\sum_{i=1}^m y_i\mathbf{u}_i\big\|_\infty \le \mu$, we have $\xi_i^* = 1/m$ for all $i$. Then it is easy to check that $\mathbf{x}^* = \mathbf{0}$ is optimal. This generalizes the result for the passive model [11, Lemma 1]. When $\tau = -1$, there are no constraints (14e)-(14g), and any $\mathbf{x}^*$ satisfying (14a)-(14d) is optimal.

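Let us define two sets

$$A = \Big\{\sum_{i=1}^m \xi_i y_i\mathbf{u}_i : -\frac{\tau}{m} \le \xi_i \le \frac{1}{m},\ i = 1, \ldots, m\Big\},\qquad B = \{\boldsymbol\beta : \|\boldsymbol\beta\|_\infty \le \mu\}.$$

If $A \cap B = \emptyset$, then the optimal $\mathbf{x}^*$ is always on the unit sphere. Even if $A \cap B \neq \emptyset$, we may still have $\big\|\sum_{i=1}^m \xi_i^* y_i\mathbf{u}_i\big\|_\infty > \mu$ when $c > 0$, and then the optimal $\mathbf{x}^*$ is on the unit sphere. However, if $c = 0$, the optimal dual objective is 0 when $A \cap B \neq \emptyset$, and the primal objective is zero at $\mathbf{x} = \mathbf{0}$, so $\mathbf{x}^* = \mathbf{0}$ is optimal for the primal problem.

To obtain an optimal $\mathbf{x}^*$ on the unit sphere, we can choose a smaller $\mu$, because a smaller $\mu$ implies a smaller $B$ and hence a smaller $A \cap B$.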


C. Dual coordinate ascent algorithm 

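In the dual problem (13), the constraints are separable, and we can apply the coordinate ascent method efficiently. The subproblems are:

1) $\boldsymbol\beta$-subproblem: For fixed $\boldsymbol\xi$, maximizing $D(\boldsymbol\beta,\boldsymbol\xi)$ over $\boldsymbol\beta$ amounts to minimizing $\big\|\sum_{i=1}^m \xi_i y_i\mathbf{u}_i - \boldsymbol\beta\big\|_2^2$ over the box $\|\boldsymbol\beta\|_\infty \le \mu$, which is separable with respect to the components of $\boldsymbol\beta$, so each $\beta_j$ can be computed in parallel via

$$\beta_j = \max\Big\{-\mu,\ \min\Big\{\mu,\ \Big(\sum_{i=1}^m \xi_i y_i\mathbf{u}_i\Big)_j\Big\}\Big\}. \tag{15}$$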


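2) $\xi_i$-subproblem: Let us update $\xi_i$ to $\xi_i + d_i$. Then we obtain the following optimization problem in $d_i$:

$$\max_{d_i}\ c\,d_i - \Big\|\sum_{j=1}^m \xi_j y_j\mathbf{u}_j - \boldsymbol\beta + d_i y_i\mathbf{u}_i\Big\|_2 \quad \text{s.t.}\quad -\frac{\tau}{m} \le \xi_i + d_i \le \frac{1}{m}. \tag{16}$$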
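Denote $\mathbf{w} = \sum_{j=1}^m \xi_j y_j\mathbf{u}_j - \boldsymbol\beta$. Problem (16) becomes

$$\max_{d_i}\ c\,d_i - \sqrt{\|\mathbf{u}_i\|_2^2 d_i^2 + 2y_i\mathbf{u}_i^T\mathbf{w}\,d_i + \|\mathbf{w}\|_2^2} \quad \text{s.t.}\quad -\frac{\tau}{m} - \xi_i \le d_i \le \frac{1}{m} - \xi_i.$$

The optimal solution $d_i^*$ can be calculated analytically as follows:

• If $\|\mathbf{u}_i\|_2 \le c$, the objective function is non-decreasing (by the Cauchy-Schwarz inequality, its derivative is at least $c - \|\mathbf{u}_i\|_2 \ge 0$). Hence $d_i^* = 1/m - \xi_i$ is optimal, i.e., $\xi_i^* = 1/m$.

• If $\|\mathbf{u}_i\|_2 > c$, we have

$$d_i^* = \max\Big\{-\frac{\tau}{m} - \xi_i,\ \min\Big\{\frac{1}{m} - \xi_i,\ \bar d_i\Big\}\Big\}, \tag{17}$$

where

$$\bar d_i = \frac{-B_d + \sqrt{B_d^2 - 4A_dC_d}}{2A_d} \tag{18}$$

with

$$A_d = \|\mathbf{u}_i\|_2^2\big(\|\mathbf{u}_i\|_2^2 - c^2\big),\qquad B_d = 2\big(\|\mathbf{u}_i\|_2^2 - c^2\big)y_i\mathbf{u}_i^T\mathbf{w},\qquad C_d = (\mathbf{u}_i^T\mathbf{w})^2 - c^2\|\mathbf{w}\|_2^2.$$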

Summarizing the previous discussion, we give the dual coordinate ascent method for (11) in Algorithm 2.


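Algorithm 2: Dual coordinate ascent for ep-SVM

Set $l = 0$, $\beta^0 = 0_{n\times 1}$, $\xi^0 = \frac{1}{m} 1_{m\times 1}$;
Calculate $w = \sum_{i=1}^m \xi_i^0 y_i u_i - \beta^0$;
repeat
    for $i = 1, 2, \ldots, m$ do
        if $c \ge \|u_i\|_2$ then
            $d_i^* = 1/m - \xi_i^l$;
        else
            Calculate $\bar d_i$ by (18) and $d_i^*$ by (17);
        end
        if $d_i^* \neq 0$ then
            $w = w + y_i u_i d_i^*$;
        end
        $\xi_i^{l+1} = \xi_i^l + d_i^*$;
    end
    Calculate $\beta^{l+1}$ by (15) and set $w = \sum_{i=1}^m \xi_i^{l+1} y_i u_i - \beta^{l+1}$;
    $l := l + 1$;
until $\xi^l = \xi^{l-1}$;
if $\|w\|_2 > 0$ then
    $x^* = w/\|w\|_2$;
else
    Find $x^*$ that satisfies (14);
end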
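To make the loop structure concrete, here is a compact pure-Python sketch of Algorithm 2. It illustrates the algorithm under stated assumptions and is not the authors' Matlab implementation. The name `ep_svm_dca` is hypothetical, and the measurements are passed as a list of rows $u_i$. The vector $w$ is recomputed after each $\beta$-update (15). The stopping rule combines the `until` test with the tolerance $\|\xi^l - \xi^{l-1}\|_\infty < \delta$, using $\delta = (1+\tau)/(10m)$ and $l_{\max} = 100$ as in the experiments. The case $w = 0$, in which any $x^*$ satisfying (14) is optimal, is not handled and is reported as `None`.

```python
import math

def ep_svm_dca(U, y, c, tau, mu, l_max=100, delta=None):
    """Sketch of Algorithm 2 (dual coordinate ascent for ep-SVM).

    U: list of m measurement vectors u_i (length n); y: labels in {-1, +1}.
    Returns (x, xi); x is None when w = 0 (the case covered by (14)).
    """
    m, n = len(U), len(U[0])
    if delta is None:
        delta = (1.0 + tau) / (10.0 * m)        # tolerance used in the experiments
    xi = [1.0 / m] * m                          # xi^0 = (1/m) 1
    v = [sum(xi[i] * y[i] * U[i][j] for i in range(m)) for j in range(n)]  # sum_i xi_i y_i u_i
    w = v[:]                                    # w = v - beta^0 with beta^0 = 0
    sq = [sum(a * a for a in u) for u in U]     # ||u_i||_2^2
    for _ in range(l_max):
        change = 0.0
        for i in range(m):
            u, lo, hi = U[i], -tau / m - xi[i], 1.0 / m - xi[i]
            if c * c >= sq[i]:                  # c >= ||u_i||_2: push xi_i to 1/m
                d = hi
            else:                               # eqs. (17)-(18)
                uw = sum(a * b for a, b in zip(u, w))
                ww = sum(b * b for b in w)
                A = sq[i] * (sq[i] - c * c)
                B = 2.0 * (sq[i] - c * c) * y[i] * uw
                C = uw * uw - c * c * ww
                d_bar = (-B + math.sqrt(max(B * B - 4.0 * A * C, 0.0))) / (2.0 * A)
                d = max(lo, min(hi, d_bar))
            if d != 0.0:                        # incremental update of w (and v)
                for j in range(n):
                    step = y[i] * u[j] * d
                    w[j] += step
                    v[j] += step
                xi[i] += d
                change = max(change, abs(d))
        beta = [max(-mu, min(mu, vj)) for vj in v]   # eq. (15)
        w = [vj - bj for vj, bj in zip(v, beta)]     # w for the new beta
        if change == 0.0 or change < delta:          # xi unchanged, or ||.||_inf < delta
            break
    nw = math.sqrt(sum(b * b for b in w))
    return ([b / nw for b in w] if nw > 0 else None), xi
```

With $\tau = -1$, every $d_i^*$ is zero and the loop stops after one pass. The returned x is then the soft-thresholded and normalized vector $\frac{1}{m}\sum_i y_i u_i$, i.e., the closed-form solution of the passive model.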


When $\tau = -1$, the algorithm terminates after one iteration, because the interval $[-\tau/m, 1/m]$ reduces to the single point $1/m$, so $\xi_i^* = 1/m$ for all $i$. Thus, there is an analytical solution for the passive model. For other $\tau$ values, there is no analytical solution. However, the next theorem states that the output $x^*$ of Algorithm 2 is optimal for ep-SVM.

Theorem 1: The dual coordinate ascent method for ep-SVM (Algorithm 2) converges to an optimal solution of the ep-SVM problem.

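Proof: The optimality condition for ep-SVM is as follows.

• If $\|x\|_2 = 1$, there exists $\xi$ satisfying
$$\xi_i \begin{cases} = 1/m, & \text{if } c - y_i(u_i^\top x) > 0,\\ \in [-\tau/m,\, 1/m], & \text{if } c - y_i(u_i^\top x) = 0,\\ = -\tau/m, & \text{if } c - y_i(u_i^\top x) < 0, \end{cases} \tag{19}$$
such that $\mu\|x\|_1 - \sum_{i=1}^m \xi_i y_i(u_i^\top x) \le 0$.

• If $\|x\|_2 < 1$, there exists $\xi$ satisfying
$$\xi_i \begin{cases} = 1/m, & \text{if } c - y_i(u_i^\top x) > 0,\\ \in [-\tau/m,\, 1/m], & \text{if } c - y_i(u_i^\top x) = 0,\\ = -\tau/m, & \text{if } c - y_i(u_i^\top x) < 0, \end{cases} \tag{20}$$
such that $\mu\|x\|_1 - \sum_{i=1}^m \xi_i y_i(u_i^\top x) = 0$.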

We now show that the output of Algorithm 2 satisfies the optimality condition.

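Case 1 ($w \neq 0$): We have $\|x^*\|_2 = 1$, and the algorithm shows that $\{\beta_j^*\}$ and $\{\xi_i^*\}$ are coordinate maximizers of (13). Thus we have:

• If $\xi_i^* = 1/m$, then $y_i(u_i^\top x^*) \le c$.

• If $\xi_i^* = -\tau/m$, then $y_i(u_i^\top x^*) \ge c$.

• If $\xi_i^* \in (-\tau/m, 1/m)$, then $y_i(u_i^\top x^*) = c$.

• If $|(\sum_{i=1}^m \xi_i^* y_i u_i)_j| \le \mu$, then $x_j^* = 0$.

• If $(\sum_{i=1}^m \xi_i^* y_i u_i)_j > \mu$, then $x_j^* > 0$.

• If $(\sum_{i=1}^m \xi_i^* y_i u_i)_j < -\mu$, then $x_j^* < 0$.

Therefore, $\xi^*$ satisfies (19), and we have
$$x^* = \frac{w}{\|w\|_2} = \frac{\sum_{i=1}^m \xi_i^* y_i u_i - \beta^*}{\|\sum_{i=1}^m \xi_i^* y_i u_i - \beta^*\|_2},$$
which means that $x^*$ is optimal.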

Case 2 ($w = 0$): We have $\|x^*\|_2 < 1$. Because $\{\beta_j^*\}$ and $\{\xi_i^*\}$ are coordinate maximizers of (13), we obtain:

• If $\xi_i^* = 1/m$, then $\|y_i u_i\|_2 \le c$.

• If $\xi_i^* = -\tau/m$, then $\|y_i u_i\|_2 \ge c$.

• If $\xi_i^* \in (-\tau/m, 1/m)$, then $\|y_i u_i\|_2 = c$.

• If $|(\sum_{i=1}^m \xi_i^* y_i u_i)_j| < \mu$, i.e., $|\beta_j^*| < \mu$, then $x_j^* = 0$.

• If $(\sum_{i=1}^m \xi_i^* y_i u_i)_j = \mu$, i.e., $\beta_j^* = \mu$, then $x_j^* \ge 0$.

• If $(\sum_{i=1}^m \xi_i^* y_i u_i)_j = -\mu$, i.e., $\beta_j^* = -\mu$, then $x_j^* \le 0$.

Therefore, for any $x^*$ satisfying (14), $\xi^*$ satisfies (20), and we have
$$\sum_{i=1}^m \xi_i^* y_i(u_i^\top x^*) = \Big(\sum_{i=1}^m \xi_i^* y_i u_i\Big)^\top x^* = \mu\|x^*\|_1,$$
which means that $x^*$ is optimal. ■
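The proof rests on the primal-dual pair formed by ep-SVM and (13). Neither problem is restated in this section, so the sketch below makes two assumptions. First, ep-SVM minimizes $\frac{1}{m}\sum_i L_\tau(c - y_i u_i^\top x) + \mu\|x\|_1$ over $\|x\|_2 \le 1$, with pinball loss $L_\tau(t) = t$ for $t \ge 0$ and $-\tau t$ for $t < 0$. Second, (13) maximizes $c\sum_i \xi_i - \|\sum_i \xi_i y_i u_i - \beta\|_2$ over $\xi_i \in [-\tau/m, 1/m]$ and $\|\beta\|_\infty \le \mu$. Both forms are consistent with the subproblems (15)-(16), and all helper names are hypothetical. `xi_from_x` picks one $\xi$ allowed by (19), for which $\sum_i \xi_i(c - y_i u_i^\top x)$ equals the average pinball loss. `dual` never exceeds `primal` at feasible points (weak duality).

```python
import math

def pinball(t, tau):
    """Assumed pinball loss L_tau(t): t for t >= 0, -tau * t for t < 0."""
    return t if t >= 0 else -tau * t

def margins(x, U, y, c):
    """t_i = c - y_i u_i^T x for every measurement."""
    return [c - yi * sum(a * b for a, b in zip(ui, x)) for ui, yi in zip(U, y)]

def primal(x, U, y, c, tau, mu):
    """Assumed ep-SVM objective (to be minimized over ||x||_2 <= 1)."""
    ts = margins(x, U, y, c)
    return sum(pinball(t, tau) for t in ts) / len(U) + mu * sum(abs(a) for a in x)

def dual(xi, beta, U, y, c):
    """Assumed form of the dual objective (13), to be maximized."""
    n = len(U[0])
    w = [sum(x_i * yi * ui[j] for x_i, ui, yi in zip(xi, U, y)) - beta[j] for j in range(n)]
    return c * sum(xi) - math.sqrt(sum(a * a for a in w))

def xi_from_x(x, U, y, c, tau):
    """One xi allowed by (19): 1/m if t_i >= 0 (any value is allowed at t_i = 0),
    otherwise -tau/m."""
    m = len(U)
    return [1.0 / m if t >= 0 else -tau / m for t in margins(x, U, y, c)]
```

The choice of $1/m$ at $t_i = 0$ is arbitrary: (19) allows any value in $[-\tau/m, 1/m]$ there, and the product $\xi_i t_i$ vanishes either way.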

Remark 2: Both the proof of Theorem 1 and Algorithm 2 suggest that if $c > \|u_i\|_2$ for all $i$, then $\xi_i^* = 1/m$, and ep-SVM reduces to the passive model no matter what $\tau$ is. This happens because $c - y_i(u_i^\top x) \ge c - \|u_i\|_2\|x\|_2 > 0$ whenever $\|x\|_2 \le 1$. Therefore, we choose $c$ to be much smaller than most $\|u_i\|_2$. In all the experiments in this paper, the $u_i$ have the same dimension ($n = 1000$) and are generated in the same way, so we use the same $c$ throughout.
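Remark 2 can be quantified for a typical measurement model. The sketch below assumes rows $u_i \sim \mathcal{N}(0, I_n)$; this is an illustration, not necessarily the generator used in the experiments. Under that assumption, $\|u_i\|_2$ concentrates around $\sqrt{n} \approx 31.6$ when $n = 1000$. The hypothetical helper counts the rows that fall into the first branch of Algorithm 2, i.e., $\|u_i\|_2 \le c$.

```python
import math
import random

def fraction_first_branch(n, m, c, seed=0):
    """Fraction of rows with ||u_i||_2 <= c, i.e. rows for which Algorithm 2
    sets xi_i = 1/m regardless of tau. Rows are drawn i.i.d. from N(0, I_n),
    which is an assumption made for this illustration."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(m):
        norm = math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)))
        if norm <= c:
            hits += 1
    return hits / m
```

For $n = 1000$ and $c = 1$, `fraction_first_branch(1000, 300, 1.0)` returns `0.0`. Every coordinate update therefore uses (17)-(18), and $\tau$ genuinely affects the solution.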

In practice, we can set a maximum number of iterations $l_{\max}$ and use $\|\xi^l - \xi^{l-1}\|_\infty < \delta$ as the stopping criterion, where $\delta$ is a small positive number. In the following experiments, we set $l_{\max} = 100$ and $\delta = (1+\tau)/(10m)$, i.e., one tenth of the length of the interval $[-\tau/m, 1/m]$.












D. Selection of τ, c, and μ

Although the passive model has an analytical solution, the linear loss is not a good classification loss, either in regular classification problems or in 1bit-CS. Thus we choose the pinball loss with $\tau > -1$. To evaluate the improvement of ep-SVM with different $\tau$ values, we consider an experiment similar to Experiment 1.

Experiment 4. We randomly generate a 1000-dimensional 15-sparse vector $x$ in the same way as in Experiment 1. Then we take 300 binary measurements and flip 10% of them. Ep-SVM is evaluated with different $\tau$ and $c$ values. For the regularization coefficient, we choose $\mu = \sqrt{\log n / m}$, as suggested in [17]. The experiments are repeated 100 times, and the average recovery error is plotted in Fig. 4.
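A data-generation sketch for Experiment 4 follows. The details of Experiment 1 are not repeated in this section, so several choices here are assumptions: Gaussian measurement vectors, Gaussian nonzero entries on a uniformly random support, the normalization $\|x\|_2 = 1$, and the natural logarithm in $\mu$. The sizes (n = 1000, K = 15, m = 300, 10% flips) and $\mu = \sqrt{\log n / m}$ follow the text. The function name `make_experiment` is hypothetical.

```python
import math
import random

def make_experiment(n=1000, K=15, m=300, flip_ratio=0.10, seed=0):
    """Return (x, U, y, mu) for one trial of a sign-flip experiment."""
    rng = random.Random(seed)
    x = [0.0] * n
    for j in rng.sample(range(n), K):              # random K-sparse support
        x[j] = rng.gauss(0.0, 1.0)
    nx = math.sqrt(sum(a * a for a in x))
    x = [a / nx for a in x]                         # one-bit data lose the scale
    U = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    y = [1 if sum(a * b for a, b in zip(u, x)) >= 0 else -1 for u in U]
    for i in rng.sample(range(m), int(round(flip_ratio * m))):
        y[i] = -y[i]                                # flip exactly 10% of the signs
    mu = math.sqrt(math.log(n) / m)                 # regularization coefficient
    return x, U, y, mu
```

With these sizes, $\mu = \sqrt{\log 1000 / 300} \approx 0.152$.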




Fig. 4. Average recovery error of ep-SVM for different τ and c values: (a) the bias term is set to c = 1 and different τ values are evaluated; (b) τ = −0.7 and different c values are evaluated.

This experiment gives results similar to those of Experiment 1. The recovery error is not monotonic with respect to $\tau$, and a suitable $\tau$ value, e.g., $\tau = -0.7$ in this experiment, leads to a better result. The results for different $c$ values in Fig. 4(b) suggest that $c = 1$ is a good choice for this measurement system.

Different $\mu$ values may be suitable for different $\tau$ values. In the following, we set $\mu = C\sqrt{\log n / m}$ and consider the performance with different $C$ and $\tau$ values. The corresponding average recovery error is shown by a contour map in Fig. 5, where the curves represent the level sets and the colors stand for the recovery error. Generally, when $\tau = -1$, the suitable $C$ is around 1, which is also the suggestion of [17]. If a larger $\tau$ is used, the corresponding suitable $C$ is smaller. The relationship between $\tau$ and $\mu$ is problem-dependent. For practical use, we suggest $\tau = -0.5$ and $\mu = 0.7\sqrt{\log n / m}$ for ep-SVM. In the numerical experiments, we will evaluate other parameter values as well.

V. Numerical Experiments 
A. Known sparsity 

In the previous sections, we introduced the pinball loss for robust 1bit-CS and established two models and two corresponding algorithms. Several simple experiments illustrated that pinball loss minimization improves the recovery performance for 1bit-CS. In this section, we further evaluate the performance of the pinball loss in more experiments with different noise levels and different numbers of measurements.



Fig. 5. Contour map of the average recovery error of ep-SVM for different τ and μ values, where $\mu = C\sqrt{\log n / m}$. The experiment parameters are the same as those in Fig. 4.


To highlight the main purpose of this paper, i.e., using a new loss function for 1bit-CS, we do not consider advanced techniques such as AOP. As shown in Fig. 2, suitably applying those techniques to pinball loss minimization can further improve the performance.

First, assume that the sparsity is known in advance. Then the BIHT model and the PIHT model (9) are applicable to recover the signal. We solve them by BIHT¹ and Algorithm 1, respectively. Neither BIHT nor PIHT has a stopping criterion, so we set the maximum number of iterations to 500 for both. Although the experiment shown in Fig. 1 implies that PIHT with $\tau = -0.2$ is a good choice, we also evaluate the performance for $\tau = -0.1$ and $\tau = -0.4$. The data are generated in the same way as in Experiments 1-4: we have $m$ one-bit measurements and try to recover an $n$-dimensional $K$-sparse signal. The sign flip ratio is $r_f$, and the SNR of the measurements is $r_n$. All results below are averaged over 100 trials. The experiments are run in Matlab 2013a on a Core i5 1.80 GHz machine with 4.0 GB of memory. The source code for Algorithm 1 and Algorithm 2 can be found on the authors' homepages.

To test the performance of BIHT and PIHT for different numbers of measurements, we select $n = 1000$, $r_f = 10\%$, $r_n = \infty$, $K = 20$ and vary $m$ from 100 to 5000. Fig. 6 displays the performance of BIHT and PIHT with different $\tau$ values. Compared with BIHT, using a negative $\tau$ improves the performance significantly at a similar computational cost. The good performance of $\tau = -0.2$ is again confirmed.




Fig. 6. The performance of BIHT (blue dashed line) and PIHT with τ = −0.1 (green dotted line), τ = −0.2 (red solid line), and τ = −0.4 (black dot-dashed line). In this experiment, $n = 1000$, $r_f = 10\%$, $K = 20$, and $r_n = \infty$: (a) recovery error vs. m; (b) computational time vs. m.

1 http://perso.uclouvain.be/laurent.jacques/index.php/Main/BIHTDemo 




























In the previous experiment, we assumed that there is no noise and considered only sign flips. Next, we consider the performance of PIHT for different SNRs with a fixed sign flip ratio. The average recovery error is shown in Fig. 7. The results again suggest $\tau = -0.2$ for this measurement system.



Fig. 7. The recovery error versus the signal-to-noise ratio $r_n$, with $n = 1000$, $m = 800$, $r_f = 10\%$, and $K = 20$. The result of BIHT is shown by the blue dashed line, PIHT with τ = −0.1 by the green dotted line, PIHT with τ = −0.2 by the red solid line, and PIHT with τ = −0.4 by the black dot-dashed line.

When the sparsity of the true signal is not known in advance but an estimate of it is available, we may still apply BIHT and PIHT with this estimate. However, the performance degrades, as shown in the next experiment. To test the performance of PIHT with different sparsity estimates, we fix the sparsity of the true signal at 20 but use different $K$ values in BIHT and PIHT. Fig. 8 shows that PIHT gives good recovery performance when the sparsity estimate is accurate. When the gap between the estimate and the true sparsity is large, however, the performance of PIHT deteriorates considerably. This experiment shows that an accurate estimate is necessary for PIHT.



Fig. 8. The recovery error versus the sparsity parameter used in BIHT and PIHT. The sparsity of the true signal is 20; the other parameters for this experiment are $n = 1000$, $m = 800$, $r_n = \infty$, and $r_f = 10\%$. The result of BIHT is shown by the blue dashed line, PIHT with τ = −0.1 by the green dotted line, PIHT with τ = −0.2 by the red solid line, and PIHT with τ = −0.4 by the black dot-dashed line.

B. Unknown sparsity 

In general, the sparsity of the true signal is not known, or the true signal is only approximately sparse. In these cases, the performance of PIHT degrades if the sparsity estimate is incorrect, and we can instead consider Plan's model, the passive model, or the proposed ep-SVM. Plan's model and the passive model use the same loss function, and they have similar recovery errors according to the numerical study in [17]. Since there is an efficient algorithm for the passive model, in this paper we compare ep-SVM only with the passive model.

Both the passive model and ep-SVM have a regularization coefficient $\mu$. As suggested by [17], we set $\mu = \sqrt{\log n / m}$ for the passive model. For ep-SVM, the suitable $\mu$ value depends on $\tau$, as illustrated by Fig. 5. Heuristically, we let $\mu = C\sqrt{\log n / m}$, where $C$ is given below.


τ     -0.4   -0.5   -0.7   -0.9   -1.0
C      0.6    0.7    0.8    0.9    1.0
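The heuristic above amounts to a small lookup table. Here is a minimal sketch, assuming the natural logarithm in $\mu = C\sqrt{\log n / m}$. The names `C_BY_TAU` and `mu_for_tau` are hypothetical.

```python
import math

# C values from the table above, keyed by tau.
C_BY_TAU = {-0.4: 0.6, -0.5: 0.7, -0.7: 0.8, -0.9: 0.9, -1.0: 1.0}

def mu_for_tau(tau, n, m):
    """Heuristic regularization coefficient mu = C(tau) * sqrt(log(n) / m)."""
    return C_BY_TAU[tau] * math.sqrt(math.log(n) / m)
```

For example, with $n = 1000$ and $m = 500$, $\tau = -0.5$ gives $\mu = 0.7\sqrt{\log 1000 / 500} \approx 0.082$. For $\tau$ values outside the table, the sketch raises `KeyError` instead of interpolating.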


Let $n = 1000$, $K = 20$, $r_n = \infty$, $r_f = 10\%$, and vary the number of measurements from 200 to 2000. The average recovery errors and the computational time for different $\tau$ values are listed in Table I. The smallest average recovery error for each $m$ is highlighted. The results imply that $\tau = -0.5$ with $\mu = 0.7\sqrt{\log n / m}$ is a promising choice. Although there is no analytical solution when $\tau > -1$, the computational times in Table I show that Algorithm 2 is fast. Furthermore, we test the performance of Algorithm 2 on noise-corrupted data ($r_n = 10$); the corresponding results are reported in Table II.

From the results listed in Tables I and II, we observe that minimizing the pinball loss can improve the accuracy obtained with the linear loss ($\tau = -1$). The computational time tends to increase with $\tau$, although not strictly monotonically, and Algorithm 2 is efficient overall. In practice, we suggest $\tau = -0.5$ and $\mu = 0.7\sqrt{\log n / m}$ for ep-SVM. In the following, we use this parameter set for different sparsity levels and different SNRs. The results are shown in Fig. 9, which shows the improvement brought by pinball loss minimization.




Fig. 9. The performance of the passive model (blue dashed line) and ep-SVM (red solid line): (a) recovery accuracy vs. SNR ($n = 1000$, $m = 2000$, $K = 20$, $r_f = 10\%$); (b) recovery accuracy vs. sparsity of the true signal ($n = 1000$, $m = 500$, $r_n = 20$, $r_f = 10\%$).


TABLE I
Recovery error and computational time of ep-SVM. The smallest average recovery error for each m is marked with *.

m                                 200     350     500     650     800     1100    1400    1700    2000
τ = -0.40             error       0.889   0.599   0.498   0.437   0.398   0.338   0.292   0.260   0.245
                      time (ms)   74.8    366     375     374     381     418     513     562     634
τ = -0.50             error       0.850   0.582*  0.495*  0.430*  0.390*  0.329*  0.287*  0.251*  0.235*
                      time (ms)   141     298     295     303     332     386     471     579     629
τ = -0.70             error       0.796*  0.602   0.518   0.450   0.405   0.344   0.301   0.269   0.242
                      time (ms)   135     169     176     205     249     304     385     414     456
τ = -0.90             error       0.820   0.633   0.540   0.479   0.430   0.368   0.324   0.289   0.258
                      time (ms)   63.0    86.8    108     131     152     193     248     278     311
τ = -1.00             error       0.837   0.657   0.558   0.504   0.451   0.392   0.345   0.309   0.274
(passive algorithm)   time (ms)   5.20    6.23    7.90    10.0    10.5    13.9    13.5    15.1    17.7

TABLE II
Recovery error and computational time of ep-SVM for noise-corrupted data ($r_n = 10$). The smallest average recovery error for each m is marked with *.

m                                 200     350     500     650     800     1100    1400    1700    2000
τ = -0.40             error       0.949   0.676   0.549   0.471   0.418   0.361   0.315   0.282   0.252
                      time (ms)   50.6    297     418     409     446     467     537     591     709
τ = -0.50             error       0.906   0.648*  0.541*  0.460*  0.404*  0.348*  0.305*  0.274*  0.243*
                      time (ms)   56.7    317     341     332     375     425     503     561     637
τ = -0.70             error       0.821*  0.663   0.563   0.479   0.418   0.361   0.320   0.285   0.252
                      time (ms)   146     169     185     235     267     332     405     438     470
τ = -0.90             error       0.840   0.689   0.596   0.507   0.442   0.384   0.343   0.305   0.269
                      time (ms)   66.5    90.9    111     137     172     198     243     275     304
τ = -1.00             error       0.855   0.707   0.622   0.534   0.462   0.405   0.364   0.324   0.287
(passive algorithm)   time (ms)   2.50    4.60    6.15    7.57    8.91    11.5    13.4    16.5    19.0

VI. Conclusion

In 1bit-CS, one recovers a signal from a set of sign measurements. The problem can also be regarded as a binary classification problem, for which the hinge loss enjoys many good properties. However, the linear loss performs better than the hinge loss in robust 1bit-CS. A trade-off between the two losses is therefore expected to share their good properties and improve the recovery performance for 1bit-CS. We introduced the pinball loss, which is such a trade-off between the hinge loss and the linear loss. We proposed two models for minimizing the pinball loss, together with two algorithms to solve them. PIHT improves the performance of the BIHT proposed in [6] and is suitable for the case when the sparsity of the true signal is given. Ep-SVM generalizes the passive model proposed in [17] and is suitable for the case when the sparsity of the true signal is not given. A fast dual coordinate ascent algorithm is proposed to solve ep-SVM, and its convergence is proved. The numerical experiments demonstrate that the pinball loss, as a trade-off between the hinge loss and the linear loss, improves the performance of existing 1bit-CS models. In the future, we will investigate other advanced methods based on pinball loss minimization. The related statistical properties from a learning perspective are also of interest.